A Random Forest Turbulence Prediction Algorithm
Abstract
Unlike traditional pilot reports, in-situ eddy dissipation rate (EDR) reports of atmospheric turbulence from commercial aircraft contain both positive and negative instances, are reported regularly, and have relatively accurate positions and timestamps. These data therefore make it feasible to perform more sophisticated analyses of the causes of atmospheric turbulence than were formerly possible. Several real-time gridded products that represent storm location and intensity are currently available, derived from satellite, radar, and numerical weather model data. These include quantities such as vertically integrated liquid (VIL), echo tops, and wind direction and velocity. In this paper, the authors present a machine-learning algorithm that predicts in-situ peak EDR based primarily on the values of VIL and echo tops in a spatial neighborhood extending approximately 300 km around the measurement point. To summarize the values of the gridded products associated with each in-situ EDR measurement, a set of quantities was computed, including distances to grid points with data over certain thresholds, maximum data values within each sub-region, and the proportion of grid points over various thresholds within each sub-region. A set of the most useful features for turbulence prediction was then determined using a large-scale automated feature selection algorithm. First, an estimate of the "quality" of each candidate feature was calculated by training a large number of decision trees on small random subsets of candidate features and comparing their performance on a testing set both with and without the feature in question. Then, a linear programming problem was formulated in which the "best" subset of features was chosen under the constraint that no two selected features for a given data source could overlap.
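The neighborhood summaries described above (distance to the nearest grid point over a threshold, maximum value, and proportion of grid points over a threshold) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `summarize_neighborhood`, the square grid with uniform spacing `cell_km`, and the use of a single circular neighborhood in place of the paper's sub-regions are all assumptions made for illustration, and the small `vil` grid is invented data.

```python
import math

def summarize_neighborhood(grid, center, radius_km, threshold, cell_km=1.0):
    """Summarize gridded values (e.g. VIL) around one measurement point.

    grid      : list of rows of floats (hypothetical gridded product)
    center    : (row, col) of the grid cell nearest the measurement
    radius_km : radius of the circular neighborhood (assumption: the paper
                uses sub-regions, which are not specified in the abstract)
    threshold : a value strictly above this counts as "over threshold"
    cell_km   : assumed uniform grid spacing in km
    """
    r0, c0 = center
    max_value = None
    n_cells = 0
    n_over = 0
    nearest_over_km = None
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            dist_km = math.hypot(r - r0, c - c0) * cell_km
            if dist_km > radius_km:
                continue  # outside the neighborhood
            n_cells += 1
            if max_value is None or value > max_value:
                max_value = value
            if value > threshold:
                n_over += 1
                if nearest_over_km is None or dist_km < nearest_over_km:
                    nearest_over_km = dist_km
    return {
        "max_value": max_value,
        "fraction_over": n_over / n_cells if n_cells else 0.0,
        "nearest_over_km": nearest_over_km,  # None if nothing exceeds the threshold
    }

# Invented 5x5 grid with one strong cell two grid spacings east of the center.
vil = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 6.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
features = summarize_neighborhood(vil, center=(2, 2), radius_km=2.0, threshold=3.5)
```

In a fuller version, one such dictionary would be computed per data source, threshold, and sub-region, giving the pool of candidate features from which the selection step chooses.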
The selected features and a large training set were then used to train a random forest as a predictive algorithm. Finally, the performance of the random forest on an independent testing set was evaluated and compared to another turbulence-prediction product.

*Corresponding author address: Andrew Cotter, Toyota Technological Institute at Chicago, University Press Building, Second Floor, 1427 East 60th Street, Chicago, IL 60637; e-mail: [email protected]
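The train-then-evaluate step described in the abstract can be sketched with a toy random forest. This is an illustration under several assumptions, not the authors' algorithm: each tree is reduced to a depth-one split (a stump), the target is a binary turbulence/no-turbulence label rather than peak EDR, and the data are synthetic, with two informative columns and one noise column standing in for the selected features. All function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint between neighboring distinct values
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:  # all rows identical: fall back to a constant prediction
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, max_features=2, seed=0):
    """Train stumps on bootstrap samples, each seeing a random feature subset."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feature_ids = sorted(rng.sample(range(n_features), max_features))
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature_ids))
    return forest

def predict(forest, row):
    """Majority vote over all trees."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

def make_data(n, rng):
    """Synthetic data: columns 0 and 1 separate the classes, column 2 is noise."""
    rows, labels = [], []
    for i in range(n):
        y = i % 2
        lo = 0.7 if y else 0.0
        rows.append([rng.uniform(lo, lo + 0.3),
                     rng.uniform(lo, lo + 0.3),
                     rng.uniform(0.0, 1.0)])
        labels.append(y)
    return rows, labels

data_rng = random.Random(1)
train_rows, train_labels = make_data(40, data_rng)
test_rows, test_labels = make_data(20, data_rng)  # independent testing set

forest = fit_forest(train_rows, train_labels)
correct = sum(predict(forest, r) == y for r, y in zip(test_rows, test_labels))
accuracy = correct / len(test_rows)
```

Holding out `test_rows` mirrors the independent testing set mentioned in the abstract. A practical version would grow deeper trees and would more likely rely on an established library than on hand-written stumps.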
Publication date: 2006